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Abstract 

Given a family of probability measures in P(A'), the space of probability mea¬ 
sures on a Hilbert space X, our goal in this paper is to highlight one ore more 
curves in P(A') that summarize efficiently that family. We propose to study this 
problem under the optimal transport (Wasserstein) geometry, using curves that 
are restricted to be geodesic segments under that metric. We show that concepts 
that play a key role in Euclidean PCA, such as data centering or orthogonality of 
principal directions, find a natural equivalent in the optimal transport geometry, 
using Wasserstein means and differential geometry. The implementation of these 
ideas is, however, computationally challenging. To achieve scalable algorithms 
that can handle thousands of measures, we propose to use a relaxed definition 
for geodesics and regularized optimal transport distances. The interest of our ap¬ 
proach is demonstrated on images seen either as shapes or color histograms. 


1 Introduction 


Optimal transport distances ( |Villani[ |2008| ), a.k.a Wasserstein or earth mover’s distances, define 
a powerful geometry to compare probability measures supported on a metric space T'. The 
Wasserstein space P(T')—the space of probability measures on A' endowed with the Wasserstein 
distance—is a metric space which has received ample interest from a theoretical perspective. Given 
the prominent role played by probability measures and feature histograms in machine learn ing, the 
properties of P(T') can also have practical implications in data science. This was shown byjAgueh 
and Carlier| ( |2011| ) who described first Wasserstein m eans of probability mea sures. Wass erstein 
means have been recently us ed in Bayesian infere nce ( [Srivastava et al. 2015), clustering ( [Cuturi 
and Doucet 2014| ), graphics ( [Solomon et al.|[MT5 ) or brain imaging ( Gramfort et al.[ [MH] ). When 
X is not just metric but also a Hilbert space, P(A') is an infinite-dimensional Riemannian manifold 
( Ambrosio et al.|2006| Chap. 8 ; [Villani|2008[ Part II). Three recent contributions by 


(2015| §5.2), [Bigot et al.[([2015[) and[Wang et ain([2013]) exploit directly or indirectly t 

- _ j - - 


Boissard et al. 


his structure to 


extend Principal Component Analysis (PCA) io P[X). These important seminal papers are, how¬ 
ever, limited in their applicability and/or the type of curves they output. Our goal in this paper is to 
propose more general and scalable algorithms to carry out Wasserstein principal geodesic analysis 
on probability measures, and not simply dimensionality reduction as explained below. 


Principal Geodesics in P{X) vs. Dimensionality Reduction on P{X) We provide in Fig. [^a 
simple example that illustrates the motivation of this paper, and which also shows how our approach 
differentiates itself from existing dimensionality reduction algorithms (linear and non-linear) that 
draw inspiration from PCA. As shown in Fig.[2 linear PCA cannot p roduce components that re - 
main in P(/T). Even more advanced tools, such as those proposed by|Hastie and Stuetzle|(l98^, 


fall slightly short of that goal. On the other hand, Wasserstein geodesic analysis yields geodesic 
components mP{X) that are easy to interpret and which can also be used to reduce dimensionality. 


1 























































Figure 1: (top-left) Dataset: 60 x 60 images of a single Chinese character randomly translated, 
scaled and slightly rotated (36 images displayed out of 300 used). Each image is handled as a 
normalized histogram of 3, 600 non-negative intensities, (middle-left) Dataset schematically drawn 
on P(A'). The Wasserstein principal geodesics of this dataset are depicted in red, its Euclidean 
components in blue, and its principal curve ( [Verbeek et al.[ |2002 ) in yellow, (right) Actual curves 
(blue colors depict negative intensities, green intensities > 1). Neither the Euclidean components 
nor the principal curve belong to P(A’), nor can they be interpreted as meaningful axis of variation. 


Foundations of PCA and Riemannian Extensions Carrying out PCA on a family (xi,..., 
of points taken in a space X can be described in abstract terms as: (i) define a mean element x 
for that dataset; (ii) define a family of components in A, typically geodesic curves, that contain x\ 
(Hi) fit a component by making it follow the x^’s as closely as possible, in the sense that the sum 
of the distances of each point Xi to that component is minimized; (iv) fit additional components 
by iterating step (in) several times, with the added constraint that each new component is different 
(orthogonal) enough to the previous components. When X is Euclidean and the x^’s are vectors in 
the (n + l)-th component Vn+i can be computed iteratively by solving: 

N 

Vn +1 ^ argmin min \\xi — (x + where Vq =' 0, and = span{i;i, • • • , Vn}. (1) 

vev^M2=if^i 


Since PCA is known to boil down to a simple eigen-decomposition when X is Euclidean or Hilber- 
tian ( [Scholkopf et al.| |1997| ), Eq. ([T]) looks artificially complica ted. This formulation is, however, 
extremely useful to generalize PCA to Riemannian manifolds ( [Fletcher et al.[ [2004] ). This gen¬ 
eralization proceeds first by rep lacing vector means, lines and orthogonality conditions using re¬ 
spectively |Fric^ means ( |1948| ), geodesics, and orthogonality in tangent spaces. Riemannian PCA 
builds then upon the knowledge of the exponential map at each point x of the manifold X. Each ex¬ 
ponential map exp^ is locally bijective between the tangent space of x and X. After computing 
the Frechet mean x of the dataset, the logarithmic map log^ at x (the inverse of exp^) is used to map 
all data points Xi onto T^. Because is a Euclidean space by definition of Riemannian manifolds, 
the dataset (log^ Xi)i can be studied using Euclidean PCA. Principal geodesics in X can then be 
recovered by applying the exponential map to a principal component v'^, {exp^(ti;*), |t| < e}. 


From Riemannian PCA to Wasserstein PCA: Related Work As remarked by [Bigot et "ST 
( [2015[ ), [Fletcher et "^ s approach cannot be used as it is to define Wasserstein geodesic PCA, be¬ 
cause P{X) is infinite dimensional and because there are no known ways to define exponential 
maps which are locally bijective between Wasserstein tangent spaces and the manifold of probabil¬ 
ity measures. To circumvent this problem, [Boissard et al.[ ( [2015] ), [Bigot et al.[ ( [MT5] ) have proposed 
to formulate the geodesic PCA problem directly as an optimization problem over curves in P(A'). 
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Boissard et al. and [Bigot et al.| study the Wasserstein PC A problem in restricted scenarios: [Bigot 
et al.[ focus their attention on measures supported on A' = M, which considerably simplifies their 


analysis since it is known in tha t case that the Wasserstein space P(M) can be embedded isometri- 
cally in ( 


Boissard et al. 


assume that each input measure has been generated from a single 
template density (the mean measure) which has been transformed according to one “admissible de¬ 
formation” taken in a parameterized family of deformation maps. Their approach to Wasserstein 
PC A boils down to a functional PC A on such maps. [Wang et ah proposed a more general approach: 
given a family of input empirical measures (/ii,..., /i at), they propose to compute first a “template 
measure” fi using /c-means clustering on jii. They consider next all optimal transport plans tt^ 
between that template jl and each of the measures iii, and propose to compute the barycentric pro¬ 
jection (see Eq.[^ of each optimal transport plan tt^ to recover Monge maps T^, on which standard 
PCA can be used. This approach is computationally attractive since it requires the computation of 
only one optimal transport per input measure. Its weakness lies, however, in the fact that the curves 
in P(A') obtained by displacing fl along each of these PCA directions are not geodesics in general. 


Contributions and Outline We propose a new algorithm to compute Wasserstein Principal 
Geodesics (WPG) in P(A') for arbitrary Hilbert spaces A’. We use several approximations—both 
of the optimal transport metric and of its geodesics—to obtain tractable algorithms that can scale 
to thousands of measures. We provide first in ^a review of the key concepts used in this paper, 
namely Wasserstein distances and means, geodesics and tangent spaces in the Wasserstein space. 
We propose in ^to parameterize a Wasserstein principal component (PC) using two velocity fields 
defined on the support of the Wasserstein mean of all measures, and formulate the WPG problem 
as that of optimizing these velocity fields so that the average distance of all measures to that PC 
is minimal. This problem is non-convex and non-smooth. We propose to optimize smooth upper- 
bounds of that objective using entropy regularized optimal transport in Q The practical interest of 
our approach is demonstrated in ^on toy samples, datasets of shapes and histograms of colors. 

Notations We write (A, 5) for the Frobenius dot-product of matrices A and B. D(i 4 ) is the 
diagonal matrix of vector u. For a mapping f : y y,wQ say that / acts on a measure /i G P{y) 
through the pushforward operator # to define a new measure /#/i G P{y)- This measure is 
characterized by the identity (/#/i)(5) = /i(/“^(5))for any Borel set 5 C y. We write pi and 
P 2 for the canonical projection operators A, defined as pi {xi , X 2 ) = xi andp 2 (^ 1 , ^ 2 ) = ^ 2 . 


2 Background on Optimal Transport 


Wasserstein Distances We start this section with the main mathematical object of this paper: 
Definition 1. {Villani 2008 Def. 6.1) Let P{A) the space of probability measures on a Hilbert 


space A. Let n(z^, rj) be the set of probability measures on with marginals v and rj, i.e. pi#7r = 

u andp2#7r = rj. The squared 2-Wasserstein distance between v and 77 in P{A) is defined as: 




inf 

Ten(zy,?7) 


/ 


Ik - y\\l:dTT{x,y). 


( 2 ) 


Wasserstein Bar ycenters Given a family of N probability measures (/ii, • • • , /xat) in P{A) and 
weights A G Agueh and Carlier (2011) define fl, the Wasserstein barycenter of these measures: 


N 

fl G argmin u). 

veP{x) 


Our paper relies on several algorithms which have been recently proposed ( [Benamou et al. 2Q15t 
Bonneel et al. 2015t Cuturi and Doucet[[2Q14[ ?) to compute such barycenters. 


Wasserstein Geodesics Given two measures u and 77 , let n*(z^, 77 ) be the set of optimal couplings 
for Eq. Informally speaking, it is well known that if either or 77 are absolutely continuous 
measures, then any optimal coupling tt* G n*(z/, 77 ) is degenerated in the sense that, assuming for 
instance that u is absolutely continuous, for all x in the support of u only one point 77 G A is 
such that (i 7 r*(x, y) > 0. In that case, the optimal transport is said to have no mass splitting, and 
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there exists an optimal mapping T \ X ^ X such that tt* can be w ritten, usin g a pushfor ward, a s 
TT* = (id X T)^u. When there is no mass splitting to transport u to [McCann s interpolant (1997): 


gt = {{l-t)id + tT)#v, iG[0,1], (3) 

defines a geodesic curve in the Wasserstein space, i.e. {gt)t is locally the shortest path between 
any two measures located on the geodesic, with respect to W 2 . In the more general case, where no 
optimal map T exists and mass splitting occurs (for some locations x one may have d7r*(x, ^) > 0 
for several y), then a geodesic can st ill be defined, but it r elies on the optimal plan tt* instead: 
Qt = ((1 — t)pi + tp2)#7r'^, t e [0,1], ( Ambrosio et al. 2006 §7.2). Both cases are shown in Fig.|^ 



Figure 2: Both plots display geodesic curves between two empirical measures u and y on An 
optimal map exists in the left plot (no mass splitting occurs), whereas some of the mass of u needs 
to be split to be transported onto y on the right plot. 


Tangent Space an d Tangent Vectors W e briefiy describe in this section the tangent spaces of 
P{X), and refer to ( [Ambrosio et al. 2006 Chap. 8) for more details. Let p : I C R ^ 
be a curve in P{X). For a given time t, the tangent space of P{X) at pt is a subset of L‘^{pt^X), 
the space of square-integrable velocity fields supported on Supp{pt). At any t, there exists tangent 
vectors Vt in L‘^{pt^X) such that lim/^^o W2{pt-\-h^ (id + hvt)#pt)/\h\ = 0. Given a geodesic 
curve in P{X) parameterized as Eq. ([^, its corresponding tangent vector at time zero is = T — id. 


3 Wasserstein Principal Geodesics 


Geodesic Parameterization The goal of principal geodesic analysis is to define geodesic curves 
in P{X) that go through the mean p and which pass close enough to all target measures pi. To that 
end, geodesic curves can be parameterized with two end points u and y. However, to avoid dealing 
with the constraint that a principal geodesic needs to go through p, one can start instead from p, and 
consider a velocity field v G L‘^{p^X) which displaces all of the mass of p in both directions: 


gt{v) = (id + tv) #//, te [-1,1]. 


(4) 


Lemma 7.2.1 of Ambrosio et al.[ ( [^06[ ) implies that any geodesic going through p can be written 
as Eq. 0. Hence, we do not lose any generality using this parameterization. However, given an 
arbitrary vector field v, the curve {gt{v))t is not necessarily a geodesic. Indeed, the maps id±v are 

not necessarily in the set Cfi = {r G L‘^ {p, X)\{idx r)^p G IP{p, r^p)} of maps that are optimal 
when moving mass away from p. Ensuring thus, at each step of our algorithm, that v is still such 
that {gt{v))^ is a geodesic curve is particularly challenging. To relax this strong assumption, we 
propose to use a gener alized formulation of ge odesics, which builds upon not one but two velocity 
fields, as introduced by [Ambrosio et al.[ ( [2006| §9.2): 

Definition 2. (adapted from ^mbrosio et al. 2006 §9.2)) Let a, u, y E P{X), and assume there 
is an optimal mapping from a to o and an optimal mapping from a toy. A generalized 
geodesic, illustrated in Fig. ^between v and y with base a is defined by, 


9t 


= ((1 - #(T, t e [ 0 ,1], 


Choosing p as the base measure in Definition [^ and two fields vi,V2 such that id — id + V2 are 
optimal mappings (in C^), we can define the following generalized geodesic gt{vi,V 2 )'. 

9t{vi,V2) = (id-t>i +t{vi +V 2 ))#p, fort € [0,1]. (5) 
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Generalized geodesics become true geodesics when vi and V 2 are positively proportional. We can 
thus consider a regularizer that controls the deviation from that property by defining Vt{vi^V 2 ) = 
{{vi,V 2 )L^fi,x) - \\vi\\mp.,x)\\v 2 \\L 2 {fi,x))'^, which is minimal when vi and V 2 are indeed posi¬ 
tively proportional. We can now formulate the WPG problem as computing, for n > 0, the (n -f 1)* 
principal (generalized) geodesic component of a family of measures by solving, with A > 0: 


N 

min An(ui,U2) -fY] min 1^2^ {gt{vi,V2), l^i),s.t 

vuV2eL^(g,X) 


id — , id -f 'i;2 G , 

■Cl-f V2 G span({v^*^ + v^2^}i<n)-^ ■ 


( 6 ) 


This problem is not convex in vi, V 2 . We pro¬ 
pose to find an approximation of that minimum 
by a projected gradient descent, with a projec¬ 
tion that is to be understood in terms of an al¬ 
ternative metric on the space of vector fields 
To preserve the optimality of the 
mappings id — vi and id + V 2 between itera¬ 
tions, we introduce in the next paragraph a suit¬ 
able projection operator on T"). 

Remark 1. A trivial way to ensure that 
is geodesic is to impose that the vector field v is 
a translation, namely that v is uniformly equal 
to a vector r on all of Supp{fl). One can show 
in that case that the WPG problem described 
in Eq. outputs an optimal vector r which is 
the Euclidean principal component of the fam¬ 
ily formed by the means of each measure p^i. 



Figure 3: Generalized geodesic interpolation be¬ 
tween two empirical measures u and p using the 
base measure a, all defined on W = 


Projection on the Optimal Mapping Set We use a projected gradient descent method to solve 
Eq. @ approximately. We will compute the gradient of a local upper-bound of the objective of 
Eq. @ and update vi and V 2 accordingly. We then need to ensure that vi and V 2 are such that id — vi 
and id + V 2 belong to the set of optimal mappings C^. To do so, we would ideally want to compute 
the projection r 2 of id -b V 2 in 


r 2 =argmin ||(id-f V 2 ) - 

'f'^LCn 


(7) 


to update V 2 ^ r 2 — id. Westdickenberg ( 2010| ) has shown that the set of optimal mappings is a 
convex closed cone in T'), leading to the existence and the unicity of the solution of Eq. Q. 
However, there is to our knowledge no known method to compute the projection r 2 of id + ^’ 2 - 
There is nevertheless a well known and efficient approach to find a mapping r 2 in which is close 
to id V 2 . That approach, known as the the barycentric projection, requires to compute first an 
optimal coupling tt* between ft and (id V 2 )fifi, to define then a (conditional expectation) map 


T^^{x)= [ yd7r'^{y\x). ( 8 ) 

_ _ 

[Ambrosio et al.| ( [20Q^ Theorem 12.4.4) or |Reich| ( |2013| Lemma 3.1) have shown that is indeed 
an optimal mapping between fl and fifl. We can thus set the velocity field as V 2 ^ — id to 

carry out an approximate projection. We show in the supplementary material that this operator can 
be in fact interpreted as a projection under a pseudo-metric GWji on I/^(/i, W). 


4 Computing Principal Generalized Geodesics in Practice 

We show in this section that when A' = the steps outlined above can be implemented efficiently. 

Input Measures and Their Barycenter Each input measure in the family {pi , • • • , /i^v) is a finite 
weighted sum of Diracs, described by rii points contained in a matrix Xi of size d x rii, and a (non¬ 
negative) weight vector of dimension rii summing to 1. The Wasserstein mean of these measures 
is given and equal to p = "f2k=i where the nonnegative vector b = ( 61 , • • • ,bp) sums to one, 

and Y = [^ 1 , • • • ,yp] G is the matrix containing locations of fi. 
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Generalized Geodesic Two velocity vectors for each of the p points in ft are needed to pa¬ 
rameterize a generalized geodesic. These velocity fields will be represented by two matrices 
= [^ 15 ' • • 5 and V 2 = [^15 • • • Assuming that these velocity fields yield optimal 

mappings, the points at time t of that generalized geodesic are the measures parameterized by 

p 

9 t{Vi,V 2 ) = Y, with locations Z* = [z*,..., z*j = Y - Vi + t(Vi + V 2 ). 

k=l 


The squared 2-Wasserstein distance between datum pi and a point V 2 ) on the geodesic is: 

Wi{gt{Vl,V 2 ),^ii)= min {P,Mz,x,), (9) 

PeU(b,ai) 

where U(h^ai) is the transportation polytope {P G Pln^ = b^P^lp = a^}, and Mz^Xi 

stands for the p x rii matrix of squared-Euclidean distances between the p and rii column vectors of 
Zt and Xi respectively. Writing Zt = Zt) and = , we have that 

Mz,x, = ztll + Ipxf - 2ZjXi G 

which, by taking into account the marginal conditions on P G U{b,ai), leads to, 

(P, Mz^Xi) = h^zt + ajxi - 2(P, Z'[Xi ). (10) 


1. Majorization of the Distance of each pi to the Principal Geodesic Using Eq. ( p^ , the dis¬ 
tance between each pi and the PC {gt{Vi,V 2 ))t can be cast as a function /i of (Vi, V 2 )- 


MVuV 2 )= min 
tG [0,1] 


+ af Xi 


+ min -2{P,{Y-Vi+tiVi+V 2 )y Xi) 

P£U(b,ai) 


( 11 ) 


where we have replaced Zt above by its explicit form in t to highlight that the objective above 
is quadratic convex plus piecewise linear concave as a function of t, and thus neither convex nor 
concave. Assume that we are given P^ and that are approximate arg-minima for /i(Vi, V 2 )- For 
any A, P in we thus have that each distance /i(Vi, V 2 ) appearing in Eq. ([^, is such that 


fi (^, B) < (^, B) ^ (P«, Mz^,X ,). (12) 

We can thus use a majorization-minimization procedure ( [Hunter and Lange] [2000 1 to minimize the 
sum of terms ft by iteratively creating majorization functions at each iterate (Vi, V 2 )- All 

functions are quadratic convex. Given that we need to ensure that these velocity fields yield 

optimal mappings, and that they may also need to satisfy orthogonality constraints with respect to 
lower- order principal componen ts, we use gradient steps to update Ui, V 2 , which can be recovered 
using ( jCuturi and Doucet||2014| §4.3) and the chain rule as: 

= 2(t« - l){Zp - A,P«^D(6-i)), = 2t\Zp - X,P«^D(6-i)). (13) 


2. Efficient Approximation of P^ and As discussed above, gradients for majorization func¬ 
tions can be obtained using approximate minima P^ and for each function ft. Because the 

objective of Eq. O is not convex w.r.t. t, we propose to do an exhaustive 1-d grid search with K 
values in [0,1]. This approach would still require, in theory, to solve K optimal transport problems 
to solve Eq. ( [iT] ) for each of the N input measure s. To carry out this step efficiently, we propose 
to use entropy regularized transport ( |Cuturi|[2Q13| ), which allows for much faster computations and 
efficient parallelizations to recover approximately optimal transports P ^. 


3. Projected Gradient Update Velocity fields are updated with a gradient stepsize /3 > 0, 


N 


N 


Vi^Vi- + AViQ ],V2^V2-P[Y + AV 2 O , 


\i=l 


Ki=l 


followed by a projection step to enforce that Vi and U 2 he in span(u/^^ + p ’' , 

in the (/i, X) sense when computing the {n + 1)* PC. We finally apply the barycentric projection 

operator defined in the end of ^ We first need to compute two optimal transport plans, 

P^ G argmin (P, My{y-Vi) ^2 ^ argmin (P, My{y+V 2 ) (14) 

PeU(b,b) Peu(b,b) 

to form the barycentric projections, which then yield updated velocity vectors: 

Ui ^ - {{Y - Vi)PfB{b-^) - y) , V2^{Yp V2)PfB{b-^) - Y. (15) 

We repeat steps 1,2,3 until convergence. Pseudo-code is given in the supplementary material. 
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Experiments 
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Figure 4: Wasserstein mean fi and first PC computed on a dataset of four (left) and three (right) 
empirical measures. The second PC is also displayed in the right figure. 


Toy samples: We first run our algorithm on two simple synthetic examples. We consider re¬ 
spectively 4 and 3 empirical measures supported on a small number of locations m X = so 
that we can compute their exact Wasserstein m eans, using the multi-marginal linear programming 
formulation given in ( |Agueh and Carlier]|2011[ §4). These measures and their mean (red squares) 
are shown in Fig. The first principal component on the left example is able to capture both the 
variability of average measure locations, from left to right, and also the variability in the spread 
of the measure locations. On the right example, the first principal component captures the overall 
elliptic shape of the supports of all considered measures. The second principal component refiects 
the variability in the parameters of each ellipse on which measures are located. The variability in 
the weights of each location is also captured through the Wasserstein mean, since each single line 
of a generalized geodesic has a corresponding location and weight in the Wasserstein mean. 


MNIST: For each of the digits ranging from 0 to 9, we sample 1,000 images in the MNIST 
database representing that digit. Each image, originally a 28x28 grayscale image, is converted into a 
probability distribution on that grid by normalizing each intensity by the total intensity in the im age. 
We compute the Wasserstein mean for each digit using the approach of |Benamou et al.| ( |2015| ). We 
then follow our approach to compute the first three principal geodesics for each digit. Geodesics 
for four of these digits are displayed in Fig.j^by showing intermediary (rasterized) measures on the 
curves. While some deformations in these curves can be attributed to relatively simple rotations 
around the digit center, more interesting deformations appear in some of the curves, such as the the 


loop o n the botto m left of digit 2. Our results are easy to interpret, unlike those obtained with | Wang 
et al.ts approach ( |2Q13| ) on these datasets, see supplementary material. Fig. [^displays the first PC 


obtained on a subset of MNIST composed of 2,000 images of 2 and 4 in equal proportions. 


PCi PC2 PCs 



Figure 5: 1000 images for each of the digits 1,2,3,4 were sampled from the MNIST database. We 
display above the first three PCs sampled at times tk = k/A^ /c = 0,..., 4 for each of these digits. 


Color histograms: We consider a subset of the Caltech-256 Dataset composed of three image 
categories: waterfalls, tomatoes and tennis balls, resulting in a set of 295 color images. The pixels 
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Figure 6: First PC on a subset of MNIST composed of one thousand 2s and one thousand 4s. 


contained in each image can be seen as a point-cloud in the RGB color space [0,1]^. We use /c-means 
quantization to reduce the size of these uniform point-clouds into a set of /c = 128 weighted points, 
using cluster assignments to define the weights of each of the k cluster centroids. Each image can be 
thus regarded as a discrete probability measure of 128 atoms in the tridimensional RGB space. We 
then compute the Wassers tein barycenter of these measures supported on p = 256 locations using 
( [Cuturi and Doucet||2014| Alg.2). Principal components are then computed as described in Q The 
computation for a single PC is performed within 15 minutes on an iMac (3.4GHz Intel Core i7). 
Fig. [^displays color palettes sampled along each of the first three PCs. The first PC suggests that 
the main source of color variability in the dataset is the illumination, each pixel going from dark to 
light. Second and third PCs display the variation of colors induced by the typical images’ dominant 
colors (blue, red, yellow). Fig. [^displays the second PC, along with three images projected on that 
curve. The projection of a given image on a PC is obtained by finding first the optimal time such 
that the distance of that im age to the PC at is minimum, and then by computing an optimal color 
transfer (Pitie et al. 20071 between the original image and the histogram at time C. 



Figure 7: Each row represents a PC displayed at regular time intervals from t = 0 (left) to t = 1 
(right), from the first PC (top) to the third PC (bottom). 





Eigure 8: Color palettes from the second PC (t = 0 on the left, t = 1 on the right) displayed at times 
t = 0, 1. Images displayed in the top row are original; their projection on the PC is displayed 

below, using a color transfer with the palette in the PC to which they are the closest. 

Conclusion We have proposed an approximate projected gradient descent method to compute gener¬ 
alized geodesic principal components for probability measures. Our experiments suggest that these 
principal geodesics may be useful to analyze shapes and distributions, and that they do not require 
any parameterization of shapes or deformations to be used in practice. 

Aknowledgements MC acknowledges the support of ISPS young researcher A grant 26700002. 
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